**Calculate Goldin quotients**
*First, load data containing singles

clear
use "Data/sample1_new.dta"

*Keep only the variables needed to calculate Goldin quotients; everything else can be merged back on later

keep pnr aar erhverv log_wage_new final_educ final_educ_year year1 year2 year3 year4 year5 year9 year10 year11 ten_wage_ambition start_wage_ambition individual old_ambition extreme_ambition grad_region

*Merge on ras_ftpt

*Keep master-only and matched rows; rows found only in ras_ftpt are dropped
merge 1:1 pnr aar using "Data/Core_datasets/ras_ftpt.dta", keep(master match) nogenerate

*Then calculate Goldin quotients

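*Based on years 9-11 combined
sort pnr aar
*ftpt_ras (from RAS) is kept only for the analysis sample in years 9, 10 and 11
gen ftpt_9_10_11=ftpt_ras if final_educ!=. & final_educ!=1 & old_ambition==1 & extreme_ambition==0 & (aar==year9 | aar==year10 | aar==year11)
*Person-level mean of ftpt_ras over those years: 1 is treated as full-time below, strictly between 0 and 1 as part-time
by pnr: egen temp=mean(ftpt_9_10_11)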

sum ten_wage_ambition if individual==1, de
sca ten_p1=r(p1) 
sca ten_p99=r(p99) 

gen temp_pt=ten_wage_ambition if final_educ!=. & final_educ!=1 & individual==1 & old_ambition==1 & extreme_ambition==0 & temp>0 & temp<1 & ten_wage_ambition>=ten_p1 & ten_wage_ambition<=ten_p99 /*Excluding outliers below p1 or above p99*/
gen temp_ft=ten_wage_ambition if final_educ!=. & final_educ!=1 & individual==1 & old_ambition==1 & extreme_ambition==0 & temp==1 & ten_wage_ambition>=ten_p1 & ten_wage_ambition<=ten_p99

sort final_educ

*Number of non-missing part-time and full-time observations per program
by final_educ: egen temp_count_pt_2=count(temp_pt)
by final_educ: egen temp_count_ft_2=count(temp_ft)

gen temp_pt_2=temp_pt if temp_count_pt_2>=10 /*Must observe at least 10 per program to use information*/
gen temp_ft_2=temp_ft if temp_count_ft_2>=10

by final_educ: egen temp_pt_3=mean(temp_pt_2)
by final_educ: egen temp_ft_3=mean(temp_ft_2)

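*Quotient = exp(mean full-time outcome)/exp(mean part-time outcome) within program;
*it is missing when either group has fewer than 10 observations in the program
gen goldin_quotient=exp(temp_ft_3)/exp(temp_pt_3)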
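
*Illustrative sketch (an addition, not part of the original pipeline): the quotient
*is the ratio of exponentiated group means, full-time over part-time, within a program.
*If ten_wage_ambition is a log wage (an assumption suggested by the use of exp(), not
*stated in this file), this is the ratio of geometric mean wages. The toy data and all
*toy_* names are hypothetical; preserve/restore leaves the working data untouched.
preserve
clear
input byte toy_prog double toy_logw byte toy_ft
1 10.2 1
1 10.4 1
1 10.0 0
1  9.8 0
end
*Group means within program, full-time and part-time separately
egen double toy_ft_mean=mean(cond(toy_ft==1, toy_logw, .)), by(toy_prog)
egen double toy_pt_mean=mean(cond(toy_ft==0, toy_logw, .)), by(toy_prog)
gen double toy_quotient=exp(toy_ft_mean)/exp(toy_pt_mean)
*exp(a)/exp(b) equals exp(a-b), so the quotient is exp of the gap in means
assert reldif(toy_quotient, exp(toy_ft_mean-toy_pt_mean))<1e-12
sca toy_q=toy_quotient[1]
restore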

drop temp*
drop ftpt_9_10_11


***Impute values for outdated programs***

forvalues i=81(1)85{
	
*9th grade
gen temp_g=goldin_quotient if final_educ==1109`i'
egen temp_g_2=max(temp_g)
replace goldin_quotient=temp_g_2 if inlist(final_educ, 1007, 1008, 1023, 1123, 1009, 1022) & grad_region==`i'
drop temp_g temp_g_2

replace final_educ=1007`i' if final_educ==1007 & grad_region==`i'
replace final_educ=1008`i' if final_educ==1008 & grad_region==`i'
replace final_educ=1023`i' if final_educ==1023 & grad_region==`i'
replace final_educ=1123`i' if final_educ==1123 & grad_region==`i'
replace final_educ=1009`i' if final_educ==1009 & grad_region==`i'
replace final_educ=1022`i' if final_educ==1022 & grad_region==`i'

*10th grade
gen temp_g=goldin_quotient if final_educ==1110`i'
egen temp_g_2=max(temp_g)
replace goldin_quotient=temp_g_2 if final_educ==1010 & grad_region==`i' 
drop temp_g temp_g_2

replace final_educ=1010`i' if final_educ==1010 & grad_region==`i'

}

*3.g 
gen temp_g=goldin_quotient if final_educ==1198
egen temp_g_2=max(temp_g)
replace goldin_quotient=temp_g_2 if final_educ==1097
drop temp_g temp_g_2
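
*Illustrative sketch (an addition, not part of the original pipeline): the imputation
*pattern used above. The quotient is constant within a program, so egen max() over the
*donor rows spreads that single value to every row (max() ignores missing values), and
*replace then copies it to the outdated program. The codes and toy_* names below are
*hypothetical toy values; preserve/restore leaves the working data untouched.
preserve
clear
input long toy_educ double toy_gq
110981 1.2
110981 1.2
1009 .
1009 .
end
gen double toy_donor=toy_gq if toy_educ==110981
egen double toy_donor_all=max(toy_donor)
replace toy_gq=toy_donor_all if toy_educ==1009
*Every row now carries the donor value
assert toy_gq==toy_donor_all
sca toy_imputed=toy_gq[3]
restore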


****

*Standardize (mean and sd are taken over all rows in memory, i.e. person-years, not over programs)
sum goldin_quotient
sca the_mean_s=r(mean)
sca the_sd_s=r(sd)
gen goldin_quotient_s=(goldin_quotient-the_mean_s)/the_sd_s
sum goldin_quotient_s
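
*Illustrative sketch (an addition, not part of the original pipeline): the manual
*z-score above should agree with egen's std() function, which also uses the sample
*mean and standard deviation. The toy data and toy_* names are hypothetical;
*preserve/restore leaves the working data untouched.
preserve
clear
input double toy_x
1
2
3
4
end
quietly sum toy_x
gen double toy_z=(toy_x-r(mean))/r(sd)
egen double toy_z2=std(toy_x)
assert reldif(toy_z, toy_z2)<1e-8
*A standardized variable has mean 0 and sd 1
quietly sum toy_z
sca toy_z_mean=r(mean)
sca toy_z_sd=r(sd)
restore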

**SAVE***
sort pnr aar
save "Data/Goldin_quotients_for_all.dta", replace
